Daniela’s Notes for Nicemboim, Schad, & Vasishth (2024)
  • D. Palleschi
  1. Book exercises
  2. 5  Ch. 8 Exercises
  • 1  workbook
  • Book notes
    • 2  Chapter notes
    • 3  Contrast Coding
  • Book exercises
    • 4  Ch. 1 Exercises
    • 5  Ch. 8 Exercises
  • Literaturverzeichnis

Chatper contents

  • 5.1 Exercise 8.1 Contrast coding for a four-condition design
    • 5.1.1 Questions
  • 5.2 Exercise 8.2 Helmert coding for a four-condition design.
    • 5.2.1 RQ1: dif between positive and negative
    • 5.2.2 RQ2: Grammaticality diff’s within negative (cond a vs b+c)
    • 5.2.3 RQ3: Ungramm vs. ungramm within negative (cond b vs. c)
    • 5.2.4 RQ4: Grammaticality diff’s within positive (cond d vs. e + f)
    • 5.2.5 RQ5: Gramm vs. gramm within positive (cond e vs. f)
  • 5.3 Exercise 8.3 Number of possible comparisons in a single model.
  1. Book exercises
  2. 5  Ch. 8 Exercises

5  Ch. 8 Exercises

pacman::p_load(
  tidyverse,
  brms,
  here,
  Rmisc
)
summarise <- dplyr::summarise

5.1 Exercise 8.1 Contrast coding for a four-condition design

Load the following data. These data are from Experiment 1 in a set of reading studies on Persian (Safavi, Husain, and Vasishth 2016). This is a self-paced reading study on particle-verb constructions, with a 2 x 2 design: distance (short, long) and predictability (predictable, unpredictable). The data are from a critical region in the sentence. All the data from the Safavi, Husain, and Vasishth (2016) paper are available from https://github.com/vasishth/SafaviEtAl2016.

library(bcogsci)
data("df_persianE1")
dat1 <- df_persianE1
head(dat1)
    subj item   rt distance   predability
60     4    6  568    short   predictable
94     4   17  517     long unpredictable
146    4   22  675    short   predictable
185    4    5  575     long unpredictable
215    4    3  581     long   predictable
285    4    7 1171     long   predictable

The four conditions are:

  • Distance=short and Predictability=unpredictable
  • Distance=short and Predictability=predictable
  • Distance=long and Predictability=unpredictable
  • Distance=long and Predictability=predictable

The researcher wants to do the following sets of comparisons between condition means:

Compare the condition labeled Distance=short and Predictability=unpredictable with each of the following conditions:

  • Distance=short and Predictability=predictable
  • Distance=long and Predictability=unpredictable
  • Distance=long and Predictability=predictable

5.1.1 Questions

  1. Which contrast coding is needed for such a comparison?

Treatment contrasts, because we’re comparing everything to on baseline condition.

dat1$distance <- as.factor(dat1$distance)
dat1$predability <- as.factor(dat1$predability)
levels(dat1$distance)
[1] "long"  "short"
levels(dat1$predability)
[1] "predictable"   "unpredictable"

Each have 2 levels (2x2 design).

Desired comparisons:

  • short-unpredictable vs. short-predictable

  • short-unpredictable vs. long-predictable

  • short-unpredictable vs. long-unpredictable

  • we do not care about comparing short and unpredictable

  • so we want treatment contrasts, with short-predictable as our baseline

  1. First, define the relevant contrast coding. Hint: You can do it by creating a condition column labeled a,b,c,d and then use a built-in contrast coding function.

Create variable ‘cond’

dat1 <- dat1 |> 
  mutate(cond = paste(distance, predability, sep = "-"),
         cond = fct_relevel(cond, "short-unpredictable", "short-predictable"))

Set treatment contrasts

dat1$cond <- as.factor(dat1$cond)
contrasts(dat1$cond) <- contr.treatment(4)

Print contrast matrix

contrasts(dat1$cond)
                    2 3 4
long-predictable    0 0 0
long-unpredictable  1 0 0
short-predictable   0 1 0
short-unpredictable 0 0 1

So cond1 will be a comparisons between short-unpred and long-pred, etc.

Let’s also look at the means per condition.

dat1 |> 
  Rmisc::summarySEwithin(
    measurevar = "rt", withinvars = c("distance", "predability")
  ) |> 
  knitr::kable()
distance predability N rt sd se ci
long predictable 378 577.6217 441.0407 22.68469 44.60436
long unpredictable 378 645.5847 468.6011 24.10224 47.39166
short predictable 378 535.7302 321.3159 16.52671 32.49608
short unpredictable 378 575.8413 462.0511 23.76534 46.72924
  1. Then, use the hypr library function to confirm that your contrast coding actually does the comparison you need.

We’ll need to define our 3 comparisons:

library(hypr)
levels(dat1$cond)
[1] "long-predictable"    "long-unpredictable"  "short-predictable"  
[4] "short-unpredictable"
cond_treat <-
  hypr(
   b0 = `short-unpredictable` ~ 0, # include intercept
   b1 = `long-predictable` ~ `short-unpredictable`,
   b2 = `long-unpredictable` ~ `short-unpredictable`,
   b3 = `short-predictable` ~ `short-unpredictable`,
   levels = c("short-unpredictable", "short-predictable", "long-predictable", "long-unpredictable")
 )

cond_treat
hypr object containing 4 null hypotheses:
H0.b0: 0 = short-unpredictable                       (Intercept)
H0.b1: 0 = long-predictable - short-unpredictable
H0.b2: 0 = long-unpredictable - short-unpredictable
H0.b3: 0 = short-predictable - short-unpredictable

Call:
hypr(b0 = ~`short-unpredictable`, b1 = ~`long-predictable` - 
    `short-unpredictable`, b2 = ~`long-unpredictable` - `short-unpredictable`, 
    b3 = ~`short-predictable` - `short-unpredictable`, levels = c("short-unpredictable", 
    "short-predictable", "long-predictable", "long-unpredictable"
    ))

Hypothesis matrix (transposed):
                    b0 b1 b2 b3
short-unpredictable  1 -1 -1 -1
short-predictable    0  0  0  1
long-predictable     0  1  0  0
long-unpredictable   0  0  1  0

Contrast matrix:
                    b0 b1 b2 b3
short-unpredictable 1  0  0  0 
short-predictable   1  0  0  1 
long-predictable    1  1  0  0 
long-unpredictable  1  0  1  0 

Check out hypothesis matrix

contr.hypothesis(cond_treat)
                    b1 b2 b3
short-unpredictable  0  0  0
short-predictable    0  0  1
long-predictable     1  0  0
long-unpredictable   0  1  0
attr(,"class")
[1] "hypr_cmat" "matrix"    "array"    

Set contrasts.

contrasts(dat1$cond) <- contr.hypothesis(cond_treat)

Print contrasts

contrasts(dat1$cond)
                    b1 b2 b3
short-unpredictable 0  0  0 
short-predictable   0  0  1 
long-predictable    1  0  0 
long-unpredictable  0  1  0 
  1. Fit a simple linear model with the above contrast coding and display the slopes, which constitute the relevant comparisons.
fit_dat1 <- brm(rt ~ 1 + cond,
  data = dat1,
  family = gaussian(),
  # prior = c(
  #   prior(normal(550, 25), class = Intercept),
  #   prior(normal(25, 2), class = sigma),
  #   prior(normal(25, 1), class = b)
  # ),
  file = here::here("models/exercises/ch8/fit_ch_8_ex1")
)
fixef(fit_dat1)
            Estimate Est.Error      Q2.5     Q97.5
Intercept 575.475897  19.52131 537.22261 614.16236
condb1      1.926428  27.28125 -51.72818  56.24605
condb2     69.867350  27.19697  15.34477 124.10026
condb3    -39.888746  26.46207 -91.86960  14.17325

Calculate the estimates per condition:

# intercept
short_unpred <- fixef(fit_dat1)[1,"Estimate"]
long_pred <- short_unpred + fixef(fit_dat1)[2,"Estimate"]
long_unpred <- short_unpred + fixef(fit_dat1)[3,"Estimate"]
short_pred <- short_unpred + fixef(fit_dat1)[4,"Estimate"]
short_unpred
[1] 575.4759
long_pred
[1] 577.4023
long_unpred
[1] 645.3432
short_pred
[1] 535.5872
contrasts(dat1$cond)
                    b1 b2 b3
short-unpredictable 0  0  0 
short-predictable   0  0  1 
long-predictable    1  0  0 
long-unpredictable  0  1  0 
  1. Now, compute each of the four conditions’ means and check that the slopes from the linear model correspond to the relevant differences between means that you obtained from the data.
dat1 |> 
  Rmisc::summarySEwithin(
    measurevar = "rt", withinvars = c("distance", "predability")
  ) |> 
  knitr::kable()
distance predability N rt sd se ci
long predictable 378 577.6217 441.0407 22.68469 44.60436
long unpredictable 378 645.5847 468.6011 24.10224 47.39166
short predictable 378 535.7302 321.3159 16.52671 32.49608
short unpredictable 378 575.8413 462.0511 23.76534 46.72924

Or with the tidyverse:

dat1 |> 
  dplyr::summarise(mean = mean(rt),
            .by = cond)
                 cond     mean
1   short-predictable 535.7302
2  long-unpredictable 645.5847
3    long-predictable 577.6217
4 short-unpredictable 575.8413

Yup they correspond.

5.2 Exercise 8.2 Helmert coding for a four-condition design.

Load the following data:

library(bcogsci)
data("df_polarity")
head(df_polarity)
  subject item condition times   value
1       1    6         f   SFD 327.845
2       1   24         f   SFD 205.948
3       1   35         e   SFD 315.225
4       1   17         e   SFD 264.773
5       1   34         d   SFD 252.193
6       1    7         a   SFD 155.511

The data come from an eyetracking study in German reported in Vasishth et al. (2008). The experiment is a reading study involving six conditions. The sentences are in English, but the original design was involved German sentences. In German, the word durchaus (certainly) is a positive polarity item: in the constructions used in this experiment, durchaus cannot have a c-commanding element that is a negative polarity item licensor. Here are the conditions:

  • Negative polarity items
  1. Grammatical: No man who had a beard was ever thrifty.
  2. Ungrammatical (Intrusive NPI licensor): A man who had no beard was ever thrifty.
  3. Ungrammatical: A man who had a beard was ever thrifty.
  • Positive polarity items
  1. Ungrammatical: No man who had a beard was certainly thrifty.
  2. Grammatical (Intrusive NPI licensor): A man who had no beard was certainly thrifty.
  3. Grammatical: A man who had a beard was certainly thrifty.

We will focus only on re-reading time in this data set. Subset the data so that we only have re-reading times in the data frame:

dat2 <- subset(df_polarity, times == "RRT")
head(dat2)
     subject item condition times    value
6365       1   20         b   RRT  239.571
6366       1    3         c   RRT 1866.169
6367       1   13         a   RRT  529.576
6368       1   19         a   RRT  269.002
6369       1   27         c   RRT  844.770
6370       1   26         b   RRT  634.654

The comparisons we are interested in are: > - RQ1: What is the difference in reading time between negative polarity items and positive polarity items? - RQ2: Within negative polarity items, what is the difference between grammatical and ungrammatical conditions? - RQ3: Within negative polarity items, what is the difference between the two ungrammatical conditions? - RQ4: Within positive polarity items, what is the difference between grammatical and ungrammatical conditions? - RQ5: Within positive polarity items, what is the difference between the two grammatical conditions? > Use the hypr package to specify the comparisons specified above, and then extract the contrast matrix.

My outline of our effects/comparisons of interest:

  • a main effect of polarity (positive vs negative)
  • nested effect of grammaticality within either level of polarity
  • effect of intrusivity within either polarity level
levels(dat2$condition)
[1] "a" "b" "c" "d" "e" "f"

To help me: create predictor factors.

dat2 <-
  dat2 |> 
  mutate(polarity = ifelse(condition %in% c("a", "b", "c"), "negative", "positive"),
         gramm = ifelse(condition %in% c("a", "e", "f"), "gramm", "ungramm"),
         intru = ifelse(condition %in% c("b", "e"), "intrusive", "unintrusive"))
dat2 |> 
  distinct(condition, .keep_all = T) |> 
  arrange(condition)
  subject item condition times    value polarity   gramm       intru
1       1   13         a   RRT  529.576 negative   gramm unintrusive
2       1   20         b   RRT  239.571 negative ungramm   intrusive
3       1    3         c   RRT 1866.169 negative ungramm unintrusive
4       1    4         d   RRT  332.036 positive ungramm unintrusive
5       1   11         e   RRT  806.971 positive   gramm   intrusive
6       2   29         f   RRT  319.430 positive   gramm unintrusive
cond_dat2 <-
  hypr(
   rq1 =  (a + b + c) /3~ (d + e + f) /3,
   rq2 = a ~ (b + c) / 2,
   rq3 = b ~ c,
   rq4 = d ~ (e+f)/2,
   rq5 = e ~ f,
   levels = c("a", "b", "c", "d", "e", "f")
 )

cond_dat2
hypr object containing 5 null hypotheses:
H0.rq1: 0 = (a + b + c - d - e - f)/3
H0.rq2: 0 = a - 1/2*b - 1/2*c
H0.rq3: 0 = b - c
H0.rq4: 0 = d - 1/2*e - 1/2*f
H0.rq5: 0 = e - f

Call:
hypr(rq1 = ~1/3 * a + 1/3 * b + 1/3 * c - 1/3 * d - 1/3 * e - 
    1/3 * f, rq2 = ~a - 1/2 * b - 1/2 * c, rq3 = ~b - c, rq4 = ~d - 
    1/2 * e - 1/2 * f, rq5 = ~e - f, levels = c("a", "b", "c", 
"d", "e", "f"))

Hypothesis matrix (transposed):
  rq1  rq2  rq3  rq4  rq5 
a  1/3    1    0    0    0
b  1/3 -1/2    1    0    0
c  1/3 -1/2   -1    0    0
d -1/3    0    0    1    0
e -1/3    0    0 -1/2    1
f -1/3    0    0 -1/2   -1

Contrast matrix:
  rq1  rq2  rq3  rq4  rq5 
a  1/2  2/3    0    0    0
b  1/2 -1/3  1/2    0    0
c  1/2 -1/3 -1/2    0    0
d -1/2    0    0  2/3    0
e -1/2    0    0 -1/3  1/2
f -1/2    0    0 -1/3 -1/2

Finally, specify the contrasts to the condition column in the data frame.

contrasts(dat2$condition) <- contr.hypothesis(cond_dat2)
contrasts(dat2$condition)
  rq1  rq2  rq3  rq4  rq5 
a  1/2  2/3    0    0    0
b  1/2 -1/3  1/2    0    0
c  1/2 -1/3 -1/2    0    0
d -1/2    0    0  2/3    0
e -1/2    0    0 -1/3  1/2
f -1/2    0    0 -1/3 -1/2

Fit a linear model using this contrast specification, and then check that the estimates from the model match the mean differences between the conditions being compared.

fit_dat2 <- brm(value ~ 1 + condition,
  data = dat2,
  family = gaussian(),
  # prior = c(
  #   prior(normal(550, 25), class = Intercept),
  #   prior(normal(25, 2), class = sigma),
  #   prior(normal(25, 1), class = b)
  # ),
  file = here::here("models/exercises/ch8/fit_ch_8_ex2")
)
fixef(fit_dat2)
               Estimate Est.Error       Q2.5     Q97.5
Intercept     510.79417  17.39399  477.20865 544.30347
conditionrq1  153.45661  35.39835   85.69990 223.12523
conditionrq2 -152.05606  52.45486 -251.79242 -52.02678
conditionrq3  -25.59221  56.49876 -136.06084  84.89980
conditionrq4  141.53725  55.53767   34.55437 249.76384
conditionrq5   34.69203  62.82869  -91.78247 158.63615

Calculate the estimates per condition:

contrasts(dat2$condition)
  rq1  rq2  rq3  rq4  rq5 
a  1/2  2/3    0    0    0
b  1/2 -1/3  1/2    0    0
c  1/2 -1/3 -1/2    0    0
d -1/2    0    0  2/3    0
e -1/2    0    0 -1/3  1/2
f -1/2    0    0 -1/3 -1/2
# intercept
intercept <- fixef(fit_dat2)["Intercept","Estimate"]

5.2.1 RQ1: dif between positive and negative

Raw diff’s.

dat2 |> 
  summarise(mean = mean(value),
            .by = polarity) |> 
  pivot_wider(names_from = polarity, values_from = mean) |> 
  mutate(difference = negative - positive)
# A tibble: 1 × 3
  negative positive difference
     <dbl>    <dbl>      <dbl>
1     592.     440.       152.
positive <- intercept + fixef(fit_dat2)["conditionrq1","Estimate"]*-0.5
negative <- intercept + fixef(fit_dat2)["conditionrq1","Estimate"]*0.5
rq1 <- negative - positive

Estimated diff.

rq1
[1] 153.4566

Slope value.

fixef(fit_dat2)["conditionrq1","Estimate"]
[1] 153.4566

5.2.2 RQ2: Grammaticality diff’s within negative (cond a vs b+c)

Raw diff’s.

dat2 |> 
  filter(polarity == "negative") |> 
  summarise(mean = mean(value),
            .by = gramm) |> 
  pivot_wider(names_from = gramm, values_from = mean) |> 
  mutate(difference = gramm - ungramm)
# A tibble: 1 × 3
  ungramm gramm difference
    <dbl> <dbl>      <dbl>
1    639.  487.      -152.
est_gramm <- intercept + fixef(fit_dat2)["conditionrq2","Estimate"]*0.5
est_ungramm <- intercept + fixef(fit_dat2)["conditionrq2","Estimate"]*-0.5
rq2 <- est_gramm - est_ungramm

Estimated diff.

rq2
[1] -152.0561

Slope value.

fixef(fit_dat2)["conditionrq2","Estimate"]
[1] -152.0561

5.2.3 RQ3: Ungramm vs. ungramm within negative (cond b vs. c)

Raw diff’s.

dat2 |> 
  summarise(mean = mean(value),
            .by = condition) |> 
  filter(condition %in% c("b", "c")) |> 
  pivot_wider(names_from = condition, values_from = mean) |> 
  mutate(difference = b - c)
# A tibble: 1 × 3
      b     c difference
  <dbl> <dbl>      <dbl>
1  626.  652.      -25.4
est_b <- intercept + fixef(fit_dat2)["conditionrq3","Estimate"]*0.5
est_c <- intercept + fixef(fit_dat2)["conditionrq3","Estimate"]*-0.5
rq3 <- est_b - est_c

Estimated diff.

rq3
[1] -25.59221

Slope value.

fixef(fit_dat2)["conditionrq3","Estimate"]
[1] -25.59221

5.2.4 RQ4: Grammaticality diff’s within positive (cond d vs. e + f)

Raw diff’s.

dat2 |> 
  filter(polarity == "positive") |> 
  summarise(mean = mean(value),
            .by = gramm) |> 
  pivot_wider(names_from = gramm, values_from = mean) |> 
  mutate(difference = gramm - ungramm)
# A tibble: 1 × 3
  ungramm gramm difference
    <dbl> <dbl>      <dbl>
1    529.  389.      -140.
est_gramm <- intercept + fixef(fit_dat2)["conditionrq4","Estimate"]*0.5
est_ungramm <- intercept + fixef(fit_dat2)["conditionrq4","Estimate"]*-0.5
rq4 <- est_gramm - est_ungramm

Estimated diff.

rq4
[1] 141.5373

Slope value.

fixef(fit_dat2)["conditionrq4","Estimate"]
[1] 141.5373

5.2.5 RQ5: Gramm vs. gramm within positive (cond e vs. f)

Raw diff’s.

dat2 |> 
  summarise(mean = mean(value),
            .by = condition) |> 
  filter(condition %in% c("e", "f")) |> 
  pivot_wider(names_from = condition, values_from = mean) |> 
  mutate(difference = e - f)
# A tibble: 1 × 3
      e     f difference
  <dbl> <dbl>      <dbl>
1  405.  371.       34.0
est_e <- intercept + fixef(fit_dat2)["conditionrq5","Estimate"]*0.5
est_f <- intercept + fixef(fit_dat2)["conditionrq5","Estimate"]*-0.5
rq5 <- est_e - est_f

Estimated diff.

rq5
[1] 34.69203

Slope value.

fixef(fit_dat2)["conditionrq5","Estimate"]
[1] 34.69203

5.3 Exercise 8.3 Number of possible comparisons in a single model.

How many comparisons can one make in a single model when there is a single factor with four levels? Why can we not code four comparisons in a single model?

1 v (2,3,4) (1,2) v. (3,4) (1,2,3) v. 4 1 v 2 1 v 3 1 v 4

3 v. (3)

How many comparisons can one code in a model where there are two factors, one with three levels and one with two levels?

Six conditions, two main effects.

How about a model for a 2 x 2 design?

Four conditions, two main effects. - compare main effects (x1) - compare nested effects for pred1/pred2 (x2) + and pred2/pred1 (x2)

4  Ch. 1 Exercises
Literaturverzeichnis
Quellcode
---
title: "Ch. 8 Exercises"
---

```{r}
pacman::p_load(
  tidyverse,
  brms,
  here,
  Rmisc
)
summarise <- dplyr::summarise
```

## **Exercise 8.1** Contrast coding for a four-condition design

> Load the following data. These data are from Experiment 1 in a set of reading studies on Persian (Safavi, Husain, and Vasishth 2016). This is a self-paced reading study on particle-verb constructions, with a  2 x 2 design: distance (short, long) and predictability (predictable, unpredictable). The data are from a critical region in the sentence. All the data from the Safavi, Husain, and Vasishth (2016) paper are available from https://github.com/vasishth/SafaviEtAl2016.

```{r}
library(bcogsci)
data("df_persianE1")
dat1 <- df_persianE1
head(dat1)
```

> The four conditions are:
>
- Distance=short and Predictability=unpredictable
- Distance=short and Predictability=predictable
- Distance=long and Predictability=unpredictable
- Distance=long and Predictability=predictable
>
> The researcher wants to do the following sets of comparisons between condition means:
>
> Compare the condition labeled Distance=short and Predictability=unpredictable with each of the following conditions:
> 
- Distance=short and Predictability=predictable
- Distance=long and Predictability=unpredictable
- Distance=long and Predictability=predictable

### Questions

> 1. Which contrast coding is needed for such a comparison?

Treatment contrasts, because we're comparing everything to on baseline condition.

```{r}
dat1$distance <- as.factor(dat1$distance)
dat1$predability <- as.factor(dat1$predability)
```

```{r}
levels(dat1$distance)
levels(dat1$predability)
```

Each have 2 levels (2x2 design).

Desired comparisons:

- short-unpredictable vs. short-predictable
- short-unpredictable vs. long-predictable
- short-unpredictable vs. long-unpredictable

- we do *not* care about comparing short and unpredictable
- so we want treatment contrasts, with short-predictable as our baseline


> 2. First, define the relevant contrast coding. Hint: You can do it by creating a condition column labeled a,b,c,d and then use a built-in contrast coding function.

Create variable 'cond'

```{r}
dat1 <- dat1 |> 
  mutate(cond = paste(distance, predability, sep = "-"),
         cond = fct_relevel(cond, "short-unpredictable", "short-predictable"))
```

Set treatment contrasts

```{r}
dat1$cond <- as.factor(dat1$cond)
contrasts(dat1$cond) <- contr.treatment(4)
```

Print contrast matrix

```{r}
contrasts(dat1$cond)
```

So cond1 will be a comparisons between short-unpred and long-pred, etc.

Let's also look at the means per condition.

```{r}
dat1 |> 
  Rmisc::summarySEwithin(
    measurevar = "rt", withinvars = c("distance", "predability")
  ) |> 
  knitr::kable()
```


> 3. Then, use the hypr library function to confirm that your contrast coding actually does the comparison you need.

We'll need to define our 3 comparisons:

```{r}
library(hypr)
```


```{r}
levels(dat1$cond)
```


```{r}
cond_treat <-
  hypr(
   b0 = `short-unpredictable` ~ 0, # include intercept
   b1 = `long-predictable` ~ `short-unpredictable`,
   b2 = `long-unpredictable` ~ `short-unpredictable`,
   b3 = `short-predictable` ~ `short-unpredictable`,
   levels = c("short-unpredictable", "short-predictable", "long-predictable", "long-unpredictable")
 )

cond_treat
```

Check out hypothesis matrix

```{r}
contr.hypothesis(cond_treat)
```


Set contrasts.

```{r}
contrasts(dat1$cond) <- contr.hypothesis(cond_treat)
```

Print contrasts

```{r}
contrasts(dat1$cond)
```


> 4. Fit a simple linear model with the above contrast coding and display the slopes, which constitute the relevant comparisons.

```{r}
fit_dat1 <- brm(rt ~ 1 + cond,
  data = dat1,
  family = gaussian(),
  # prior = c(
  #   prior(normal(550, 25), class = Intercept),
  #   prior(normal(25, 2), class = sigma),
  #   prior(normal(25, 1), class = b)
  # ),
  file = here::here("models/exercises/ch8/fit_ch_8_ex1")
)
```

```{r}
fixef(fit_dat1)
```

Calculate the estimates per condition:

```{r}
# intercept
short_unpred <- fixef(fit_dat1)[1,"Estimate"]
long_pred <- short_unpred + fixef(fit_dat1)[2,"Estimate"]
long_unpred <- short_unpred + fixef(fit_dat1)[3,"Estimate"]
short_pred <- short_unpred + fixef(fit_dat1)[4,"Estimate"]

```

```{r}
short_unpred
long_pred
long_unpred
short_pred
```


```{r}
contrasts(dat1$cond)
```


> 5. Now, compute each of the four conditions’ means and check that the slopes from the linear model correspond to the relevant differences between means that you obtained from the data.

```{r}
dat1 |> 
  Rmisc::summarySEwithin(
    measurevar = "rt", withinvars = c("distance", "predability")
  ) |> 
  knitr::kable()
```

Or with the tidyverse:

```{r}
dat1 |> 
  dplyr::summarise(mean = mean(rt),
            .by = cond)
```

Yup they correspond.

## **Exercise 8.2** Helmert coding for a four-condition design.

> Load the following data:
>
```{r}
library(bcogsci)
data("df_polarity")
head(df_polarity)
```

> The data come from an eyetracking study in German reported in Vasishth et al. (2008). The experiment is a reading study involving six conditions. The sentences are in English, but the original design was involved German sentences. In German, the word durchaus (certainly) is a positive polarity item: in the constructions used in this experiment, durchaus cannot have a c-commanding element that is a negative polarity item licensor. Here are the conditions:
> 
- Negative polarity items
    a. Grammatical: No man who had a beard was ever thrifty.
    b. Ungrammatical (Intrusive NPI licensor): A man who had no beard was ever thrifty.
    c. Ungrammatical: A man who had a beard was ever thrifty.
- Positive polarity items
    e. Ungrammatical: No man who had a beard was certainly thrifty.
    f. Grammatical (Intrusive NPI licensor): A man who had no beard was certainly thrifty.
    g. Grammatical: A man who had a beard was certainly thrifty.
>
We will focus only on re-reading time in this data set. Subset the data so that we only have re-reading times in the data frame:

```{r}
dat2 <- subset(df_polarity, times == "RRT")
head(dat2)
```

The comparisons we are interested in are:
>
- RQ1: What is the difference in reading time between negative polarity items and positive polarity items?
- RQ2: Within negative polarity items, what is the difference between grammatical and ungrammatical conditions?
- RQ3: Within negative polarity items, what is the difference between the two ungrammatical conditions?
- RQ4: Within positive polarity items, what is the difference between grammatical and ungrammatical conditions?
- RQ5: Within positive polarity items, what is the difference between the two grammatical conditions?
>
Use the `hypr` package to specify the comparisons specified above, and then extract the contrast matrix. 

My outline of our effects/comparisons of interest:

- a main effect of polarity (positive vs negative)
- nested effect of grammaticality within either level of polarity
- effect of intrusivity within either polarity level

```{r}
levels(dat2$condition)
```

To help me: create predictor factors.

```{r}
dat2 <-
  dat2 |> 
  mutate(polarity = ifelse(condition %in% c("a", "b", "c"), "negative", "positive"),
         gramm = ifelse(condition %in% c("a", "e", "f"), "gramm", "ungramm"),
         intru = ifelse(condition %in% c("b", "e"), "intrusive", "unintrusive"))
```

```{r}
dat2 |> 
  distinct(condition, .keep_all = T) |> 
  arrange(condition)
```


```{r}
cond_dat2 <-
  hypr(
   rq1 =  (a + b + c) /3~ (d + e + f) /3,
   rq2 = a ~ (b + c) / 2,
   rq3 = b ~ c,
   rq4 = d ~ (e+f)/2,
   rq5 = e ~ f,
   levels = c("a", "b", "c", "d", "e", "f")
 )

cond_dat2
```

> Finally, specify the contrasts to the condition column in the data frame. 


```{r}
contrasts(dat2$condition) <- contr.hypothesis(cond_dat2)
```

```{r}
contrasts(dat2$condition)
```

> Fit a linear model using this contrast specification, and then check that the estimates from the model match the mean differences between the conditions being compared.

```{r}
fit_dat2 <- brm(value ~ 1 + condition,
  data = dat2,
  family = gaussian(),
  # prior = c(
  #   prior(normal(550, 25), class = Intercept),
  #   prior(normal(25, 2), class = sigma),
  #   prior(normal(25, 1), class = b)
  # ),
  file = here::here("models/exercises/ch8/fit_ch_8_ex2")
)
```

```{r}
fixef(fit_dat2)
```

Calculate the estimates per condition:

```{r}
contrasts(dat2$condition)
```


```{r}
# intercept
intercept <- fixef(fit_dat2)["Intercept","Estimate"]
```

### RQ1: dif between positive and negative

Raw diff's.

```{r}
dat2 |> 
  summarise(mean = mean(value),
            .by = polarity) |> 
  pivot_wider(names_from = polarity, values_from = mean) |> 
  mutate(difference = negative - positive)
```


```{r}
positive <- intercept + fixef(fit_dat2)["conditionrq1","Estimate"]*-0.5
negative <- intercept + fixef(fit_dat2)["conditionrq1","Estimate"]*0.5
rq1 <- negative - positive
```

Estimated diff.

```{r}
rq1
```

Slope value.

```{r}
fixef(fit_dat2)["conditionrq1","Estimate"]
```



### RQ2: Grammaticality diff's within negative (cond a vs b+c)

Raw diff's.

```{r}
dat2 |> 
  filter(polarity == "negative") |> 
  summarise(mean = mean(value),
            .by = gramm) |> 
  pivot_wider(names_from = gramm, values_from = mean) |> 
  mutate(difference = gramm - ungramm)
```


```{r}
est_gramm <- intercept + fixef(fit_dat2)["conditionrq2","Estimate"]*0.5
est_ungramm <- intercept + fixef(fit_dat2)["conditionrq2","Estimate"]*-0.5
rq2 <- est_gramm - est_ungramm
```

Estimated diff.

```{r}
rq2
```

Slope value.

```{r}
fixef(fit_dat2)["conditionrq2","Estimate"]
```

### RQ3: Ungramm vs. ungramm within negative (cond b vs. c)

Raw diff's.

```{r}
dat2 |> 
  summarise(mean = mean(value),
            .by = condition) |> 
  filter(condition %in% c("b", "c")) |> 
  pivot_wider(names_from = condition, values_from = mean) |> 
  mutate(difference = b - c)
```


```{r}
est_b <- intercept + fixef(fit_dat2)["conditionrq3","Estimate"]*0.5
est_c <- intercept + fixef(fit_dat2)["conditionrq3","Estimate"]*-0.5
rq3 <- est_b - est_c
```

Estimated diff.

```{r}
rq3
```

Slope value.

```{r}
fixef(fit_dat2)["conditionrq3","Estimate"]
```

### RQ4: Grammaticality diff's within positive (cond d vs. e + f)


Raw diff's.

```{r}
dat2 |> 
  filter(polarity == "positive") |> 
  summarise(mean = mean(value),
            .by = gramm) |> 
  pivot_wider(names_from = gramm, values_from = mean) |> 
  mutate(difference = gramm - ungramm)
```


```{r}
est_gramm <- intercept + fixef(fit_dat2)["conditionrq4","Estimate"]*0.5
est_ungramm <- intercept + fixef(fit_dat2)["conditionrq4","Estimate"]*-0.5
rq4 <- est_gramm - est_ungramm
```

Estimated diff.

```{r}
rq4
```

Slope value.

```{r}
fixef(fit_dat2)["conditionrq4","Estimate"]
```

### RQ5: Gramm vs. gramm within positive (cond e vs. f)

Raw diff's.

```{r}
dat2 |> 
  summarise(mean = mean(value),
            .by = condition) |> 
  filter(condition %in% c("e", "f")) |> 
  pivot_wider(names_from = condition, values_from = mean) |> 
  mutate(difference = e - f)
```


```{r}
est_e <- intercept + fixef(fit_dat2)["conditionrq5","Estimate"]*0.5
est_f <- intercept + fixef(fit_dat2)["conditionrq5","Estimate"]*-0.5
rq5 <- est_e - est_f
```

Estimated diff.

```{r}
rq5
```

Slope value.

```{r}
fixef(fit_dat2)["conditionrq5","Estimate"]
```


## **Exercise 8.3** Number of possible comparisons in a single model.

> How many comparisons can one make in a single model when there is a single factor with four levels? Why can we not code four comparisons in a single model?

1 v (2,3,4)
(1,2) v. (3,4)
(1,2,3) v. 4
1 v 2
1 v 3
1 v 4

3 v. (3)

> How many comparisons can one code in a model where there are two factors, one with three levels and one with two levels?

Six conditions, two main effects.

> How about a model for a  2 x 2 design?

Four conditions, two main effects.
- compare main effects (x1)
- compare nested effects for pred1/pred2 (x2)
  + and pred2/pred1 (x2)